Loop Fusion in High Performance Fortran

Authors

  • Gerald Roth
  • Ken Kennedy
Abstract

In this paper we investigate a unique problem associated with fusing loops within a High Performance Fortran (HPF) program. In particular, we discuss the issue of performing loop fusion in an HPF compiler when compiling Fortran90 array assignment statements for execution on a distributed-memory machine. During compilation of an HPF program, Fortran90 array assignment statements must be scalarized into loop nests. We show how a certain class of these loop nests, when fused, can cause problems for the compiler's distributed-memory code generator. We then present an algorithm which not only prevents the fusion of these loops, but also increases the amount of useful fusion that can be performed.

1 Introduction

High Performance Fortran (HPF) [12], an extension of Fortran90, has attracted considerable attention as a promising language for writing portable parallel programs. HPF offers a simple programming model that shields programmers from the intricacies of concurrent programming and of managing distributed data. Programmers express data parallelism using Fortran90 array operations and use data layout directives to direct the partitioning of data and computation among the processors of a parallel machine.

One transformation an HPF compiler must address is the scalarization of the Fortran90 array operations into serial DO-loops. Scalarization is often followed by loop fusion in an attempt to improve a program's data locality and data reuse characteristics. Unfortunately, the fusion of some of these scalarized loops can produce loops for which it is difficult to generate an efficient SPMD program. However, loop fusion is too important an optimization to simply disable. To solve this dilemma, we have developed an algo...
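The scalarization and fusion steps mentioned in the abstract can be illustrated with a minimal sketch. It assumes two hypothetical Fortran90 array assignments, `A(1:N) = B(1:N) + 1` and `C(1:N) = A(1:N) * 2`, which do not come from the paper, and it models arrays as plain Python lists on a single processor. It does not model data distribution, the problematic class of loop nests, or the paper's algorithm; it only shows what "scalarize, then fuse" means.

```python
def scalarized(b):
    """Each array assignment becomes its own loop (hypothetical example)."""
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(n):      # from A(1:N) = B(1:N) + 1
        a[i] = b[i] + 1
    for i in range(n):      # from C(1:N) = A(1:N) * 2
        c[i] = a[i] * 2
    return a, c


def fused(b):
    """The two loops merged into one, so a[i] is reused right after it is written."""
    n = len(b)
    a = [0] * n
    c = [0] * n
    for i in range(n):
        a[i] = b[i] + 1
        c[i] = a[i] * 2
    return a, c


if __name__ == "__main__":
    data = [1, 2, 3]
    # Fusion is legal here because iteration i of the second loop
    # only reads the value written by iteration i of the first loop.
    print(scalarized(data) == fused(data))
```

In this sketch the fused loop touches `a[i]` while it is still fresh, which is the locality and reuse benefit the text refers to.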

Similar resources

Automatic selection of high-order transformations in the IBM XL FORTRAN compilers

The IBM ASTI optimizer provides the foundation for high-order transformations and automatic shared-memory parallelization in the latest IBM XL FORTRAN (XLF) compilers for RS/6000 and PowerPC uniprocessors and symmetric multiprocessors (SMPs), and for automatic distributed-memory parallelization in the IBM XL High-Performance FORTRAN (XLHPF) compiler for the SP distributed-memory multiproc...

The Bouclettes Loop Parallelizer

Bouclettes is a source-to-source loop nest parallelizer. It takes as input Fortran uniform, perfectly nested loops and gives as output an HPF (High Performance Fortran) program with data distribution and parallel HPF INDEPENDENT loops. This paper presents the tool and the underlying parallelization methodology.

Design and optimisation of scientific programs in a categorical language

This thesis presents an investigation into the use of advanced computer languages for scientific computing, an examination of performance issues that arise from using such languages for such a task, and a step toward achieving portable performance from compilers by attacking these problems in a way that compensates for the complexity of and differences between modern computer architectures. The...

Bouclettes: A Fortran Loop Parallelizer

High Performance Fortran is a data-parallel language that allows the user to specify the parallelism in a program. It is not always easy to extract the parallelism in a given program. To help the user, an automatic loop parallelizer, Bouclettes, has been developed. Bouclettes was written to validate some scheduling and mapping techniques that are mentioned in this paper. A Fortran 77 loop...

Code generation in Bouclettes

Bouclettes is a source-to-source loop nest parallelizer. It takes as input Fortran uniform, perfectly nested loops and gives as output an equivalent High Performance Fortran program with data distribution directives and parallel (!HPF$ INDEPENDENT) loops. This paper explains how the HPF program is built from a "shifted linear schedule" and a data allocation. We focus on the problems we ha...

Publication date: 1998